A Study on the Role of Similarity Measures in Visual Text Analytics
Abstract
Text analytics is essential to a large number of applications, and good approaches for obtaining visual mappings of text are paramount. Many visualization techniques, such as similarity-based point placement layouts, have proved useful in supporting the visual analysis of documents. However, they are sensitive to data quality, which in turn relies on a critical preprocessing step that involves text 'cleaning' and, in some cases, term detection and weighting, as well as the definition of a similarity function. There has been limited discussion of the effect of these similarity calculations on the quality of visual representations. In this work, we study the effect of different text similarity measurements on the quality of visual text mappings. We focus mainly on two types of distance functions, those based on the well-known text vector representation and those based on direct string comparison, and compare their effect on visual mappings obtained with point placement techniques. We find that both have their value but that, in many circumstances, the recently introduced incremental vector space model (iVSM) is the best solution when discrimination is important. Based on the results of our evaluation, we offer recommendations on the application of different text similarity measurements to Visual Text Analytics tasks.
Keywords: Visual Text Analytics, Visual Text Mining, Vector Space Model, High-dimensional Data Visualization, Multidimensional Projections.
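The contrast between the two families of distance functions named in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: raw term-frequency vectors stand in for the vector representation (no term weighting and no iVSM), Python's `difflib.SequenceMatcher` stands in for a direct string comparison measure, and the helper names `tokenize`, `cosine_distance`, `string_distance`, and `distance_matrix` are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a crude 'cleaning' step)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_distance(a, b):
    """1 - cosine similarity between raw term-frequency vectors (assumed weighting)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def string_distance(a, b):
    """1 - difflib's matching ratio on the raw strings (assumed stand-in measure)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def distance_matrix(docs, dist):
    """Pairwise distances, the usual input to a point placement technique."""
    return [[dist(x, y) for y in docs] for x in docs]


if __name__ == "__main__":
    docs = ["visual text analytics", "analytics text visual", "fuzzy decision making"]
    for name, dist in [("vector", cosine_distance), ("string", string_distance)]:
        print(name)
        for row in distance_matrix(docs, dist):
            print(["%.2f" % d for d in row])
```

In this sketch the bag-of-words distance ignores word order, so the first two sample documents come out as near-identical, while the string-based distance separates them. Differences of this kind are what change the resulting layout.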
Similar resources
Text Analytics of Customers on Twitter: Brand Sentiments in Customer Support
Brand community interactions and online customer support have become major platforms for strengthening brand sentiment and creating loyalty. Rapid brand responses to each customer request through inbound tweets on Twitter, and proper actions to cover customers' needs, are the key elements of positive brand sentiment creation and product or service initiative management in the realm of ...
A Geometric View of Similarity Measures in Data Mining
The main objective of data mining is to acquire information from a set of data for prospective applications using a measure. A key concern is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...
INFORMATION MEASURES BASED TOPSIS METHOD FOR MULTICRITERIA DECISION MAKING PROBLEM IN INTUITIONISTIC FUZZY ENVIRONMENT
In the fuzzy set theory, information measures play a paramount role in several areas such as decision making, pattern recognition etc. In this paper, similarity measure based on cosine function and entropy measures based on logarithmic function for IFSs are proposed. Comparisons of proposed similarity and entropy measures with the existing ones are listed. Numerical results limpidly betoken th...
Visual Analytics for Large Document Sets
We examine what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. Efficient and effective approaches to both building and labeling such networks are described. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing ...
A comparative study of the text inside the Mihrabi rug by Zareh Penyamin and Topkapi Palace Museum according to the existing discourse in the 16th and 19th
In the country of Turkey, in the city of Hereke, at the end of the 19th century, rugs known as Mihrabi became popular, which were inspired by the rugs of the Safavid era and kept in the Topkapi Palace Museum. In these rugs, which were reproduced in royal workshops on a large scale, some changes were made to the verbal text and the incorporated visual elements. Among the rugs that seem to have ...
Publication date: 2013